Min-Uncertainty & Max-Certainty Criteria of Neighborhood Rough-Mutual Feature Selection
Abstract
Feature selection (FS) is an important preprocessing step in pattern recognition, machine learning, and data mining. Most existing FS methods based on rough set theory use the dependency function to evaluate the goodness of a feature subset. However, these methods may perform poorly on noisy datasets, because they use only the information in the positive region and neglect the boundary region. This paper proposes a criterion of maximal lower approximation information (Max-Certainty) and minimal boundary region information (Min-Uncertainty), based on neighborhood rough sets and mutual information, for evaluating the goodness of a feature subset. We combine this criterion with the neighborhood rough set model, which applies directly to numerical and heterogeneous features without discretizing the numerical features. Compared with rough-set-based approaches, the proposed method improves classification accuracy on various experimental datasets, and the experimental results indicate that this idea extracts valuable information from the data. The technique is demonstrated on discrete, continuous, and heterogeneous data and is compared with other FS methods in terms of subset size and classification accuracy.
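The abstract contrasts dependency-based criteria, which count only the positive region, with a criterion that also accounts for the boundary region and draws on mutual information. As an illustration only, the sketch below computes the standard neighborhood rough set quantities these criteria are built from: δ-neighborhoods, the positive (lower-approximation) region, the boundary region, the dependency degree, and one common definition of neighborhood mutual information, NMI(B; D) = NH(B) + NH(D) − NH(B, D). It assumes a Chebyshev distance and a single radius `delta`; all function names are hypothetical, and it does not reproduce the paper's exact Max-Certainty/Min-Uncertainty combination.

```python
import math
from typing import List, Sequence, Set


def neighborhoods(X: Sequence[Sequence[float]], features: Sequence[int],
                  delta: float) -> List[Set[int]]:
    """Return the delta-neighborhood of every sample under the given features.

    Assumption: Chebyshev (max-abs) distance over the selected features, so
    numerical features are used directly, without discretization.
    """
    n = len(X)
    result: List[Set[int]] = []
    for i in range(n):
        nb = set()
        for j in range(n):
            # With no features selected, every pair has distance 0.
            dist = max((abs(X[i][f] - X[j][f]) for f in features), default=0.0)
            if dist <= delta:
                nb.add(j)
        result.append(nb)
    return result


def positive_region(nb: List[Set[int]], y: Sequence[int]) -> Set[int]:
    """Samples whose whole neighborhood shares their decision label
    (the union of the lower approximations of all decision classes)."""
    return {i for i, ns in enumerate(nb) if all(y[j] == y[i] for j in ns)}


def boundary_region(nb: List[Set[int]], y: Sequence[int]) -> Set[int]:
    """Samples whose neighborhood mixes decision labels.

    Every sample lies in the upper approximation of its own class, so the
    decision boundary is the universe minus the positive region.
    """
    return set(range(len(nb))) - positive_region(nb, y)


def dependency(nb: List[Set[int]], y: Sequence[int]) -> float:
    """Dependency degree: the fraction of samples in the positive region."""
    return len(positive_region(nb, y)) / len(nb)


def neighborhood_mutual_information(nb: List[Set[int]], y: Sequence[int]) -> float:
    """NMI(B; D) = NH(B) + NH(D) - NH(B, D), using neighborhood entropies."""
    n = len(nb)
    total = 0.0
    for i, ns in enumerate(nb):
        same_class = {j for j in range(n) if y[j] == y[i]}
        joint = len(ns & same_class)  # >= 1, because i is in both sets
        total += math.log(len(ns) * len(same_class) / (n * joint))
    return -total / n


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 1, 1]
    nb = neighborhoods(X, features=[0], delta=1.0)
    print(sorted(positive_region(nb, y)))   # [0, 3]
    print(sorted(boundary_region(nb, y)))   # [1, 2]
    print(dependency(nb, y))                # 0.5
    print(round(neighborhood_mutual_information(nb, y), 4))  # 0.4904
```

In this toy example, samples 1 and 2 have neighborhoods that straddle the class border, so a dependency-only criterion simply discards them. Their information is what a boundary-aware criterion, like the one the abstract describes, tries to keep.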
Similar resources
An Efficient Gene Selection Technique based on Fuzzy C-means and Neighborhood Rough Set
Selecting genes from microarray gene expression datasets has become an important research topic, because such data typically consist of a large number of genes and a small number of samples. To avoid information loss, this work uses neighborhood mutual information to evaluate the relevance between genes. Firstly, an improved Relief feature selection algorithm is proposed to create candidate fea...
Fuzzy and Rough Set Theory Based Gene Selection Method
The selection of genes from microarray gene expression datasets has become an important research topic in cancer classification, because such data typically consist of a large number of genes and a small number of samples. In this work, neighborhood mutual information is used to evaluate the relevance between genes and to prevent information loss. Firstly, an improved Relief Feature Selectio...
On fuzzy-rough attribute selection: Criteria of Max-Dependency, Max-Relevance, Min-Redundancy, and Max-Significance
Attribute selection is one of the important problems encountered in pattern recognition, machine learning, data mining, and bioinformatics. It refers to the problem of selecting the input attributes or features that are most effective for predicting the sample categories. In this regard, rough set theory has been shown to be successful for selecting relevant and nonredundant attributes from a giv...
A New Maximum-Relevance Criterion for Significant Gene Selection
Gene (feature) selection has been an active research area in microarray analysis. Max-Relevance is one of the criteria that has been broadly used to find features highly correlated with the target class. However, most approximation methods for Max-Relevance do not consider the joint effect of features on the target class. We propose a new Max-Relevance criterion which combines the collective impact ...
A Nadir Compromise Programming for Supplier Selection Problem under Uncertainty
Supplier selection is one of the decisions that most influence the effectiveness of purchasing and manufacturing policies under competitive market conditions. Since decision makers (DMs) consider conflicting criteria when selecting suppliers, multiple-criteria programming is a promising approach to the problem. This paper develops a nadir compromise programming (NCP) model...
Publication date: 2017